refactor(session_store): thread base_dir + KeyringMode through API (#34) by hanwencheng · Pull Request #43 · litentry/agentKeys

hanwencheng · 2026-04-16T03:58:57Z

Summary

Option 2 follow-up to #42 for issue #34. Replaces the implicit \$HOME / AGENTKEYS_SESSION_STORE env-var reads at every session_store::* call with an explicit SessionStore handle that owns base_dir + KeyringMode. Tests construct one rooted at a per-test tempdir in file-only mode — no more unsafe { set_var }, no #[serial], no process-global races, no new dev-deps.

Why this on top of #42

#42 patched the symptom with #[ctor] + #[serial], which works but:

Still compiles unsafe { set_var } into the test binary (Rust 2024 flags this as unsound)
Requires every future HOME-touching test to remember #[serial] — silent flake if forgotten

This refactor removes the env-mutation pattern entirely. Tests can't reintroduce the race because the test helpers no longer expose HOME as a knob. It also unlocks a real product feature: callers can point the CLI at a non-default base directory without touching the process environment (future `AGENTKEYS_HOME` / `--data-dir` / container deploys).

Changes

`agentkeys-core/src/session_store.rs` — introduce `SessionStore` struct with `base_dir` + `KeyringMode`, with `new` / `file_only` / `from_env` constructors and method-form `save` / `load` / `clear` / `list_ids` / `session_path` / `load_with_legacy_fallback`. Existing free functions are preserved as thin wrappers that delegate to `SessionStore::from_env()`, so the daemon and any external callers keep working unchanged. Internal `marker_path` and file-I/O helpers are now methods on `SessionStore`. The module's own tests drop the `with_temp_home` mutex + `unsafe { set_var }` helper in favor of a per-test `SessionStore::file_only(tmp)`.

`agentkeys-cli/src/lib.rs` — add optional `session_store_override` field on `CommandContext` and a `.with_session_store(store)` builder. New `session_store()` accessor returns the override or `SessionStore::from_env()`. The five `session_store::*` call sites in `cmd_init`, `cmd_revoke` (x2), `cmd_recover`, and `CommandContext::load_session` now route through it.

`agentkeys-cli/tests/cli_tests.rs` — delete all 13 `unsafe { set_var }` calls and all `HOME` mutation. Every test owns a `(SessionStore, TempDir)` pair and threads the store through context builders.

Test plan

`cargo test --workspace` — all 151 tests pass (agentkeys-types 3, agentkeys-core 22, agentkeys-mock-server 56, agentkeys-cli 37, agentkeys-daemon 15, agentkeys-mcp 15, agentkeys-provisioner 3)
`cargo test -p agentkeys-cli` × 5 consecutive runs, parallel scheduler — 37/37 each time, 0.21s–0.68s
`cargo clippy -p agentkeys-core -p agentkeys-cli --tests` — no new lints (pre-existing warnings in unrelated files unchanged)
Workspace daemon unchanged — production call sites still resolve through the legacy free-function wrappers

Relationship to #42

Alternative approach — either can land. If #42 lands first, this PR will need a trivial rebase to delete the `ctor` / `serial_test` dev-deps and the `#[ctor::ctor]` block. If this lands first, #42 can be closed as superseded.

Closes #34.

Option 2 follow-up to #42. Replaces the implicit `$HOME` / `AGENTKEYS_SESSION_STORE` env-var reads at every call with an explicit `SessionStore` handle that owns both. Tests construct one rooted at a per-test tempdir in file-only mode — no more `unsafe { set_var }`, no `#[serial]`, no process-global races. Changes - agentkeys-core/src/session_store.rs: introduce `SessionStore` struct with `base_dir` + `KeyringMode`, expose `new`, `file_only`, `from_env` constructors and method-form save/load/clear/list/session_path. Existing free functions (`save_session`, `load_session`, `clear_session`, `fallback_path`, `list_fallback_session_ids`, `load_session_with_legacy_fallback`) are preserved as thin wrappers that delegate to `SessionStore::from_env()`, so the daemon and any external callers keep working unchanged. Internal `marker_path` and file-I/O helpers are now methods on `SessionStore`. - agentkeys-core session_store tests: drop the `with_temp_home` mutex + `unsafe { set_var }` helper and rewrite to build a per-test `SessionStore::file_only(tmp)` directly. Fully parallel, fully hermetic. - agentkeys-cli CommandContext: add optional `session_store_override` field and `.with_session_store(store)` builder. New `session_store()` accessor returns the override or `SessionStore::from_env()`. The five `session_store::*` call sites in `cmd_init`, `cmd_revoke` (x2), `cmd_recover`, and `CommandContext::load_session` now route through it. - agentkeys-cli integration tests: delete all 13 `unsafe { set_var }` calls and all HOME mutation. Every test owns a `(SessionStore, TempDir)` pair and threads the store through context builders. Drops the flaky race surface (issue #34) by construction rather than by serialization. Why this on top of #42 - #42 patched the symptom with `#[ctor]` + `#[serial]`, which works but still compiles `unsafe { set_var }` into the test binary and requires future HOME-touching tests to remember `#[serial]`. This refactor removes the env-mutation pattern entirely — tests can't reintroduce the race because the test helpers no longer expose HOME as a knob. - Also unlocks a genuine product feature: callers can point the CLI at a non-default base directory without touching the process environment (future `AGENTKEYS_HOME` / `--data-dir` / container deploys). Test plan - `cargo test --workspace` — all 151 tests pass - `cargo test -p agentkeys-cli` × 5 consecutive runs, parallel scheduler — 37/37 each time, ~0.2–0.7s - `cargo clippy -p agentkeys-core -p agentkeys-cli --tests` — no new lints (pre-existing warnings in unrelated files unchanged) Closes #34.

…n-by-construction (codex PR #43 [P2]) Codex /codex review on PR #43 flagged that `SessionStore::new(base_dir, KeyringMode::Auto)` was unsafe: keyring entries are keyed on `session_id` alone and ignore `base_dir`, so two stores at different roots sharing a `session_id` would silently alias through the OS keychain. The newly introduced API promised per-root isolation but the `Auto` path violated that promise. Fix: remove the public `SessionStore::new(base_dir, mode)` constructor. The only public constructors are now: - `SessionStore::file_only(base_dir)` — explicit custom root, always file-only, safe by construction - `SessionStore::from_env()` — home-rooted (the single-root invariant the keyring namespace assumes), may select `KeyringMode::Auto` Custom-root callers physically cannot opt into keyring aliasing. No callers (internal or external) used `::new` with `KeyringMode::Auto` and a custom `base_dir`, so no migration is needed. Tests: `cargo test -p agentkeys-core -p agentkeys-cli` → 22 core + 37 cli, 0 failures, 0.78s. Refs #34, addresses codex [P2] on PR #43.

hanwencheng · 2026-04-16T05:22:49Z

@claude review the PR

claude · 2026-04-16T05:23:13Z

Claude finished @hanwencheng's task in 2m 10s —— View job

PR Review: `refactor(session_store): thread base_dir + KeyringMode through API (#34)`

Verdict: Approve with suggestions — This is a well-motivated, clean refactor. The core design is sound, and the test migration is the right call over the ctor+serial_test approach in #42. A few issues worth addressing before merge.

Bugs / Correctness

1. Redundant self.load() at the end of load_with_legacy_fallback

crates/agentkeys-core/src/session_store.rs:242

pub fn load_with_legacy_fallback(&self, session_id: &str) -> Result<Session> {
    if let Ok(s) = self.load(session_id) {
        return Ok(s);  // Fast path — already failed here
    }
    if session_id == "master" {
        // ...legacy paths...
    }
    self.load(session_id)  // ← Will always fail; same as the initial load
}

After the initial self.load fails and all legacy fallbacks are exhausted, calling self.load again is redundant — it will produce the same error as the first call, just with a less useful message (no mention of the legacy paths that were tried). Replace with a direct Err:

Err(anyhow::anyhow!("no session found for id={session_id} (and no legacy fallback matched)"))

Fix this →

2. list_ids returns sanitized names — round-trip list_ids → load silently double-sanitizes for unusual session IDs

crates/agentkeys-core/src/session_store.rs:194-215

Directory names on disk are the sanitized forms. list_ids returns them verbatim, but load(returned_id) re-applies sanitize_for_keyring — meaning any session ID that was rewritten (e.g. foo/bar → __agk_foo_bar-cc5d46bd) would be double-sanitized and the load would fail. In practice this is harmless because all real session IDs (master, daemon-0xABC, etc.) are valid under the sanitizer. But this is a latent correctness hazard. At minimum add a comment:

// NOTE: returned names are the on-disk (sanitized) forms. For all production
// session IDs (alphanumeric + dash) sanitize_for_keyring is a no-op, so these
// can be passed directly to load(). If you ever call list_ids on a store that
// might hold rewritten IDs, be aware of this.

Design Concerns

3. session_store() re-reads env vars on every call

crates/agentkeys-cli/src/lib.rs:110-114

pub fn session_store(&self) -> SessionStore {
    self.session_store_override
        .clone()
        .unwrap_or_else(SessionStore::from_env)  // reads $HOME on every call
}

cmd_revoke calls ctx.session_store() twice in the same function body (once for each branch). from_env re-reads $HOME and AGENTKEYS_SESSION_STORE each time and allocates a new PathBuf. This is cheap in wall-clock terms, but it means the effective store can differ between calls within a single command if the env is mutated concurrently (unlikely, but semantically surprising).

The simpler fix is to resolve from_env once at construction time and store it in session_store_override unconditionally:

pub fn new(backend_url: &str, verbose: bool, json_output: bool) -> Self {
    Self {
        // ...
        session_store_override: Some(SessionStore::from_env()),  // eager, once
    }
}

Then with_session_store just overrides it, and session_store() always returns a &SessionStore reference (or clone of the stored value). This removes the conceptual split between "override" and "default" entirely.

4. session_store_override is pub while a builder pattern exists

crates/agentkeys-cli/src/lib.rs:54

pub session_store_override: Option<SessionStore>,

External code can bypass the builder and directly mutate or inspect this field. Since the other override fields (session_override, backend_override) are also pub, this is consistent with the existing pattern — but it's worth considering pub(crate) for all three to make the builder the sole external API.

Style / Minor

5. _session naming in start_scope_test_server — used variable should not have leading underscore

crates/agentkeys-cli/tests/cli_tests.rs:799

let (_output, _session) = cmd_init(...).await.unwrap();
// ...
.header("authorization", format!("Bearer {}", _session.token))

_session is actively used on the very next line. Leading underscores conventionally signal "intentionally unused." Rename to session to match intent.

Fix this →

6. Legacy free functions have no deprecation marker

crates/agentkeys-core/src/session_store.rs:326-359

The PR preserves fallback_path, save_session, load_session, etc. as thin wrappers and the docs say "new code should prefer building a SessionStore explicitly." A #[deprecated] annotation would enforce this at the compiler level and prevent new callers from accidentally using the old API:

#[deprecated(since = "0.2.0", note = "Use SessionStore::from_env().session_path() instead")]
pub fn fallback_path(session_id: &str) -> PathBuf {
    SessionStore::from_env().session_path(session_id)
}

What's Done Well

The isolation-by-construction invariant (file_only ↔ custom root, Auto ↔ home root only) is clearly documented and enforced structurally. The comment at lines 62–71 is the right level of explanation for a subtle correctness property.
Test hermetic isolation is now correct: each test owns its TempDir and SessionStore, no shared global state, no #[serial] required.
Path traversal defense (sanitize_for_keyring + session_dir routing through it) is sound and well-tested.
Legacy wrappers preserving the existing daemon/CLI interface is the right call for a refactor of this scope.
The clear synchrony fix (keyring delete blocks before returning) and the list_ids sort are important correctness properties that survive the refactor intact.

…k fallthrough (claude PR #43 review #1) The trailing `self.load(session_id)` at the end of `load_with_legacy_fallback` was redundant — the same call already failed at the top of the function, so re-running it produced the same error with no added context about which legacy fallbacks were tried. Replace with `anyhow::bail!` that names the attempted session path, the keyring mode, and the fact that legacy fallbacks only trigger for `id="master"`. Debuggability win; same observable behavior (still returns Err). Tests: `cargo test -p agentkeys-core -p agentkeys-cli` → 22 + 37, 0 failures.

hanwencheng added 2 commits April 16, 2026 11:58

hanwencheng merged commit 5f08790 into main Apr 16, 2026
1 check passed

This was referenced Apr 16, 2026

fix(tests): #34 race-free parallel cli tests via ctor + #[serial] #42

Closed

Stage 5a: Provisioner — deterministic + patterns (v0 critical path) #44

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(session_store): thread base_dir + KeyringMode through API (#34)#43

refactor(session_store): thread base_dir + KeyringMode through API (#34)#43
hanwencheng merged 3 commits intomainfrom
fix/issue-34-session-store-base-dir

hanwencheng commented Apr 16, 2026

Uh oh!

hanwencheng commented Apr 16, 2026

Uh oh!

claude Bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hanwencheng commented Apr 16, 2026

Summary

Why this on top of #42

Changes

Test plan

Relationship to #42

Uh oh!

hanwencheng commented Apr 16, 2026

Uh oh!

claude Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: refactor(session_store): thread base_dir + KeyringMode through API (#34)

Bugs / Correctness

Design Concerns

Style / Minor

What's Done Well

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Apr 16, 2026 •

edited

Loading

PR Review: `refactor(session_store): thread base_dir + KeyringMode through API (#34)`